This README.txt file was generated on 2024-05-23 by L. Butler


--------------------
GENERAL INFORMATION
--------------------

1. Title of Dataset: Open dataset of annual Article Processing Charges (APCs) of gold and hybrid journals published by Elsevier, Frontiers, MDPI, PLOS, Springer-Nature and Wiley 2019-2023 

2. Author Information

	A. Author Contact Information
		Name: Leigh-Ann Butler
		Institution:  University of Ottawa
		Email: leigh-ann.butler@uottawa.ca 

	B. Author Contact Information
		Name: Madelaine Hare
		Institution:  University of Ottawa
		Email: maddie.hare@uottawa.ca

	C. Author Contact Information
		Name: Nina Schönfelder
		Institution:  Bielefeld University Library
		Email: nina.schoenfelder@uni-bielefeld.de
	
	D. Author Contact Information
		Name: Eric Schares
		Institution:  Iowa State University
		Email: eschares@iastate.edu	

	E. Author Contact Information
		Name: Juan Pablo Alperin
		Institution:  Simon Fraser University
		Email: juan@alperin.ca	
	
	F. Principal Investigator Contact Information
		Name: Stefanie Haustein
		Institution:  University of Ottawa
		Email: stefanie.haustein@uottawa.ca

3. Collection instrument: 

Manual data collection and web scraping

4. How to cite: 

Butler, L.-A., Hare, M., Schönfelder, N., Schares, E., Alperin, J.P., & Haustein, S. (2024). Open dataset of annual Article Processing Charges (APCs) of gold and hybrid journals published by Elsevier, Frontiers, MDPI, PLOS, Springer-Nature and Wiley 2019-2023 (Version v1) [dataset]. Harvard Dataverse. https://doi.org/10.7910/DVN/CR1MMV

---------------------------
SHARING/ACCESS INFORMATION
---------------------------

Licenses/restrictions placed on the data: 
These data are available under a CC0 (public domain dedication) license <https://creativecommons.org/public-domain/cc0/>

---------------------
DATA & FILE OVERVIEW
---------------------

VERSION 1

1. File List

   A. Filename: APCDataset-Codebook-v1.txt  
      Short description: Codebook      

   B. Filename: APCdataset-annualAPCs_Published-v1.txt
      Short description: APC dataset in tab-delimited txt format

   C. Filename: APCdataset-ConversionRates_Published-v1.txt
      Short description: Rates used for currency conversion

   D. Filename: APCdataset-DataCleaningNotes_Published-v1.txt
      Short description: Data cleaning notes

   E. Filename: APCdataset-JournalLevel_Published-v1.txt
      Short description: Journal level data

2. Additional related data collected that was not included in the current data package: 

Original publisher APC price lists

---------------------------
METHODOLOGICAL INFORMATION
---------------------------

1. Description of methods used for collection/generation of data:

This dataset combines and standardizes data from the annual price lists of article processing charges (APCs) of six large scholarly publishers (Elsevier, Frontiers, PLOS, MDPI, Springer Nature and Wiley) between 2019 and 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve open access publishing fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC price list information in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations.

2. Methods for processing the data: 

APC data were downloaded from publisher price lists, typically provided as downloadable PDFs, structured XLSX files, or HTML displayed on publisher websites. When not available in a downloadable format, APCs were scraped from individual web pages using the Wayback Machine and collated into one XLSX file for import.

3. Instrument- or software-specific information needed to interpret the data:

The dataset is available in a coded format in tab-delimited txt. The Codebook is required to interpret values.
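
As a minimal sketch of reading the tab-delimited files, assuming UTF-8 encoding and a header row (neither is stated in this README), the Python standard library can load each row as a dictionary keyed by variable name. The function name load_rows and the column names in the sample are hypothetical; the real variable names are defined in the Codebook.

```python
import csv
import io


def load_rows(handle):
    """Read a tab-delimited text stream with a header row into a list of dicts.

    Assumption: the first line holds the variable names. Values are kept as
    strings because the data are coded; interpret them with the Codebook.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Tiny in-memory sample with made-up column names, for illustration only.
sample = io.StringIO("journal_id\tyear\tapc\nJ1\t2019\t1000\nJ1\t2020\t1100\n")
rows = load_rows(sample)
print(len(rows), sorted(rows[0].keys()))
```

To read a real file, pass an open file handle instead of the sample, for example open("APCdataset-annualAPCs_Published-v1.txt", newline="", encoding="utf-8"), adjusting the encoding if it turns out to differ.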

-----------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: APCdataset-annualAPCs_Published-v1
-----------------------------------------------------------------

1. Number of variables: 25

2. Number of cases/rows: 36,618


Refer to APCDataset-Codebook-v1.txt for detailed variable information.
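
One way to confirm that a download is complete is to compare the file's shape with the counts stated above (25 variables, 36,618 rows). The following is a sketch only, assuming the first line is a header row that is not included in the row count. The names file_shape and matches_readme are hypothetical.

```python
import csv
import io

EXPECTED_VARIABLES = 25   # "Number of variables" in this README
EXPECTED_ROWS = 36618     # "Number of cases/rows" in this README


def file_shape(handle):
    """Return (number of columns, number of data rows) for a tab-delimited stream.

    Assumption: the first line is a header and is not counted as a data row.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return len(header), n_rows


def matches_readme(shape):
    """True when the shape equals the counts documented in this README."""
    return shape == (EXPECTED_VARIABLES, EXPECTED_ROWS)


# Illustration on a tiny in-memory sample (not the real data).
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\n")
print(file_shape(sample))
```

A mismatch does not necessarily indicate a corrupted file; it may simply mean that the header assumption above does not hold.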

